Full-Stack LLM Claim Verification App

Hallucination Detection Service (HDS)

March 2026

Author: Andrew Castro


HDS is a full-stack LLM claim-verification application that analyzes generated text, breaks it into structured factual claims, retrieves supporting evidence from the Wikipedia / MediaWiki API, and returns explainable per-claim verification results.

The system combines a Python + FastAPI backend with a TypeScript + Next.js frontend to demonstrate NLP pipeline design, retrieval-aware verification, contradiction-aware scoring, and explainable AI product UX across separately deployed frontend and backend services.

Tech Stack

Backend / Verification Engine

  • Python + FastAPI for API routing, typed request/response handling, and verification orchestration.
  • spaCy + rule-based NLP for claim extraction, clause handling, subject/context resolution, and grammatical filtering.
  • sentence-transformers for embedding-based semantic similarity scoring between claims and retrieved evidence chunks.
  • Wikipedia / MediaWiki API as the external evidence source for subject-grounded and fallback claim queries.
  • Requests + BeautifulSoup for evidence retrieval, content parsing, and cleanup before chunking and comparison.

Frontend / Product Layer

  • Next.js + React + TypeScript for the interactive analyzer interface and typed client-side rendering.
  • Tailwind CSS for the production HDS UI styling system and claim/result presentation.
  • Cloudflare Pages for static frontend hosting under the custom subdomain: hds.andrewcastro.dev
  • Render for deploying the FastAPI backend as a separate Python web service.
  • REST-based JSON integration between the frontend and backend for claim analysis requests and explainable verification responses.


Live Application & Repo

The backend is hosted on Render's free tier, so the first analysis request may take a few moments while the service spins up.

Launch Live HDS Site

Visit the deployed React / Next.js frontend to analyze LLM output, inspect claim-level retrieval, and view explainable verification metadata in the production UI.



Source Code & Release History

Review the full repository, backend services, frontend interface, release notes, and deployment configuration for the current HDS architecture.


What Makes HDS Useful

LLM Output Auditing

Useful for inspecting LLM output from chatbots and identifying where support is strong, weak, contradictory, or missing.

Explainable AI UX

Shows how to present model-evaluation decisions transparently through metadata, retrieval paths, and evidence snippets instead of black-box scoring.

Portfolio Demonstration

Demonstrates backend architecture, retrieval logic, NLP preprocessing, modern frontend delivery, and cloud deployment in one integrated project.

System Workflow

01

Claim Extraction

spaCy parsing and rule-based filters convert multi-sentence LLM text into factual, verifiable claim candidates.
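A dependency-free stand-in for this step can illustrate the idea. The real pipeline uses spaCy's dependency parse; the regex sentence split and opener filters below are simplified assumptions:

```python
import re

def extract_claims(text: str) -> list[str]:
    """Split text into sentences and keep candidates that look like
    complete declarative statements (a rough stand-in for spaCy's
    subject/verb dependency checks)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    claims = []
    for sent in sentences:
        words = sent.split()
        # Drop fragments, questions, and hedged/non-factual openers.
        if len(words) < 4 or sent.endswith("?"):
            continue
        if words[0].lower() in {"maybe", "perhaps", "imagine"}:
            continue
        claims.append(sent)
    return claims
```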

02

Context Resolution

Pronouns and dependent references are rewritten into retrieval-ready claims when context confidence is high enough.
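A minimal sketch of this rewriting pass, assuming a crude capitalized-token subject heuristic in place of HDS's actual confidence gating:

```python
import re

PRONOUNS = {"he", "she", "it", "they", "this", "that"}

def resolve_context(claims: list[str]) -> list[str]:
    """Rewrite a leading pronoun to the most recent concrete subject,
    skipping the rewrite when no subject has been seen (low confidence)."""
    resolved = []
    last_subject = None
    for claim in claims:
        words = claim.split()
        if words and words[0].lower() in PRONOUNS and last_subject:
            claim = last_subject + " " + " ".join(words[1:])
        else:
            # Crude subject guess: a run of leading capitalized tokens.
            m = re.match(r"[A-Z][\w.]*(?:\s+[A-Z][\w.]*)*", claim)
            if m:
                last_subject = m.group(0)
        resolved.append(claim)
    return resolved
```

Leaving a pronoun unresolved (rather than guessing) is the conservative choice: a wrongly rewritten subject would poison the retrieval step downstream.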

03

Evidence Retrieval

Wikipedia pages are selected using subject-first queries, fallback claim search, title checks, and page extract cleanup.

04

Verification

Chunk comparison, semantic scoring, contradiction checks, and numeric/date matching generate explainable claim outcomes.
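One way to combine these signals into an outcome label. The similarity score is taken as an input here (in HDS it comes from sentence-transformers embeddings); the threshold values and the numeric-mismatch heuristic are illustrative assumptions, not the project's tuned logic:

```python
import re

def numbers_in(text: str) -> set[str]:
    """Extract numeric tokens (years, counts, decimals) for consistency checks."""
    return set(re.findall(r"\d+(?:[.,]\d+)?", text))

def classify(similarity: float, claim: str, evidence: str,
             support_threshold: float = 0.6) -> str:
    """Map semantic similarity plus a numeric/date heuristic to a verdict."""
    if not evidence:
        return "Retrieval Failed"
    if similarity < support_threshold:
        return "Insufficient Evidence"
    # A number in the claim that never appears in the evidence is a
    # cheap contradiction signal (e.g. a wrong year).
    if numbers_in(claim) - numbers_in(evidence):
        return "Contradicted"
    return "Supported"
```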

Limitations and Next Steps

HDS intentionally prioritizes inspectability over overconfident scoring. That means the UI exposes retrieval status, grounding state, subject queries, fallback search queries, contradiction reasons, source pages, and evidence text so users can understand why a result was produced. This was a major design goal because weak retrieval can easily create false positives if users only see a label and not the retrieval path behind it.

The project still has important limitations: semantic matching can over- or under-estimate support, Wikipedia is only one evidence source, and contradiction logic is still heuristic rather than full natural-language inference. Even so, the current beta version is a substantial step toward a more transparent claim-verification workflow and a stronger demonstration of full-stack software engineering.

Project Summary & Core Features

  • Transforms raw LLM output into claim-level verification units by filtering malformed or dependent fragments.
  • Builds context-aware claims through pronoun resolution and safer subject rewriting so evidence searching has stronger entity grounding.
  • Uses subject-first retrieval with claim fallback to separate entity grounding from evidence search and reduce weak whole-claim matching.
  • Scores evidence support with semantic similarity and supplements it with contradiction heuristics and numeric/date consistency checks.
  • Surfaces explainability fields in the UI such as retrieval status, grounding status, retrieval strategy, evidence snippets, and search keywords.
  • Separates verification states clearly into Supported, Contradicted, Insufficient Evidence, Retrieval Failed, and Malformed Claim tags.
  • Aggregates document-level summaries conservatively so a single high-scoring claim does not overstate overall support.
  • Demonstrates full-stack deployment architecture with a Cloudflare-hosted frontend and Render-hosted FastAPI backend.
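The conservative document-level roll-up described above could be sketched as follows; the summary label names and the majority rule are my own assumptions:

```python
def summarize(verdicts: list[str]) -> str:
    """Document-level roll-up: contradictions dominate, and 'supported'
    requires a majority of verifiable claims, not one strong claim."""
    verifiable = [v for v in verdicts
                  if v not in ("Malformed Claim", "Retrieval Failed")]
    if not verifiable:
        return "Unverifiable"
    if "Contradicted" in verifiable:
        return "Contains Contradictions"
    supported = sum(v == "Supported" for v in verifiable)
    return "Mostly Supported" if supported / len(verifiable) > 0.5 else "Weakly Supported"
```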

Conclusion

HDS is a portfolio project focused on the engineering challenges behind verifying LLM-generated claims, not on pretending to be a perfect fact-checking engine. It demonstrates how to combine NLP preprocessing, retrieval, contradiction-aware scoring, typed API design, frontend explainability, and cloud deployment into a cohesive product. From a recruiter or engineering-review perspective, the value of the project is in the system design, debugging transparency, and end-to-end full-stack implementation rather than in claiming 100% factual accuracy.
